Detailed Description

Claims

Fulltext word Count: 8076 
English Abstract 

A system for determining the start of a match of a regular expression
includes a special state table that contains start entries and terminal
entries, and a set of start state registers for holding offset
information. The system further includes a DFA next state table that,
given the current state and an input character, returns the next state. A
settable indicator is included in the DFA next state table corresponding
to each next state table entry which indicates whether to perform a
lookup in the special state table. A compiler loads values into the
special state table based on the regular expression. A method for
determining the start of a match of a regular expression using the
special state table, the set of start state registers and the DFA next
state table, includes the step of determining from the regular expression
each start-of-match start state and each end-of-match terminal state. For
each start state, a start state entry is loaded into the special state
table. For each terminal state, a terminal state entry is loaded into
each special state table. The next state table is used to return the next
state from the current state and an input character. When a start state
is encountered, the current offset from the beginning of the input
character string is loaded into the start state register. When a terminal
state is encountered, the terminal state entry is retrieved from the
special state table, and the value of the start state register
corresponding to the rule number of the terminal entry in the special
state table is further retrieved. The value of the start state register
which is retrieved indicates the location in the character string where
the start-of-match occurred for a particular rule.
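
The table-driven flow in this abstract can be pictured with a minimal
sketch. Everything named here (build_tables, scan, the dictionary layout,
the single rule 0 for the literal "abc") is a hypothetical illustration and
not the patent's implementation; the tables are hand-compiled rather than
produced by a compiler, and the example relies on "abc" having no
self-overlap, so one start register per rule is enough.

```python
START, TERMINAL = "start", "terminal"


def build_tables():
    """Hand-compiled tables for one rule (rule 0): search for the literal "abc"."""
    next_state = {}  # (state, char) -> (next state, special-lookup indicator)
    for state in range(4):
        next_state[(state, "a")] = (1, True)   # entering state 1 begins a match
    next_state[(1, "b")] = (2, False)
    next_state[(2, "c")] = (3, True)           # entering state 3 ends a match
    special = {1: (START, 0), 3: (TERMINAL, 0)}  # state -> (entry type, rule number)
    return next_state, special


def scan(text):
    """Return (rule, start offset, end offset) for every match found in text."""
    next_state, special = build_tables()
    start_regs = {}  # rule number -> offset where the current match began
    state, found = 0, []
    for offset, ch in enumerate(text):
        # Any (state, char) pair missing from the table falls back to state 0.
        state, flagged = next_state.get((state, ch), (0, False))
        if not flagged:
            continue  # indicator clear: no special state table lookup
        kind, rule = special[state]
        if kind == START:
            start_regs[rule] = offset  # remember where this match started
        else:
            found.append((rule, start_regs[rule], offset))
    return found
```

Calling scan on a string returns one tuple per match; the start offset in
each tuple is read back from the start register at the moment the terminal
entry is reached.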

French Abstract 

L'invention concerne un systeme permettant de detecter le debut d'une
correspondance d'une expression reguliere, comprenant une table d'etat
special contenant des entrees de debut et des entrees de fin, et une
serie de registres d'etat de debut servant a contenir des informations
d'ecart. Le systeme comprend egalement une table d'etat suivant DFA
(automate deterministe a etats finis) qui, selon l'etat reel et un
caractere entre, retourne l'etat suivant. La table d'etat suivant DFA
comprend un indicateur reglable correspondant a chaque entree de table
d'etat suivant indiquant s'il faut effectuer une recherche dans la table
d'etat special. Ce systeme comprend egalement un compilateur permettant
de charger des valeurs dans la table d'etat special sur la base de
l'expression reguliere. L'invention concerne egalement un procede
permettant de detecter le debut d'une correspondance d'une expression
reguliere a l'aide de la table d'etat special, de l'ensemble de registres
d'etat de debut et de la table d'etat suivant DFA, consistant a detecter
a partir de l'expression reguliere chaque etat de debut de correspondance
et chaque etat final de correspondance de fin. Pour chaque etat de debut,
une entree d'etat de debut est chargee dans la table d'etat special. Pour
chaque etat final, une entree d'etat final est chargee dans chaque table
d'etat special. La table d'etat suivant est utilisee pour retourner
l'etat suivant a partir de l'etat reel et un caractere d'entree.
Lorsqu'un etat de debut est rencontre, le decalage reel par rapport au
debut de la chaine de caracteres d'entree est charge dans le registre
d'etat de debut. Lorsqu'un etat final est rencontre, l'entree d'etat
final est recuperee de la table d'etat special, et la valeur du registre
d'etat de debut correspondant au numero de regle de l'entree finale dans
la table d'etat special est egalement recuperee. La valeur du registre
d'etat de debut qui est recuperee indique l'emplacement dans la chaine de
caracteres ou le debut de correspondant s'est produit pour une regle
particuliere.
Detailed Description

Claims 

Full text word Count: 12630 
English Abstract 

Embodiments of the invention provide a programmable FSA building block, 
having a number of programmable registers and associated logic 
implemented therein, that provide the capability of contextually 
evaluating complex REs of arbitrary size against multiple data streams. 
Embodiments of the invention provide fully programmable hardware in which 
all of the states of an RE are instantiated and all of the states are 
fully connected. For one embodiment, the building blocks have a fixed 
number of states to facilitate implementation on a chip. For such an 
embodiment, an RE having an excessive number of states is implemented on 
two or more FSA building blocks and the FSA building blocks are then 
stitched together to effect evaluation of the RE. For one embodiment, two 
or more REs having a number of states less than the fixed number of 
states of a building block may be implemented with a single building 
block. 

French Abstract 



La presente invention, dans divers modes de realisation, a trait a un
bloc fonctionnel d'automates d'etats finis programmable, comportant une
pluralite de registres programmables et une logique associee qui y sont
executes, fournissant la capacite d'evaluer en contexte des expressions
regulieres complexes de taille arbitraire par rapport a de multiples
trains de donnees. Les modes de realisation de l'invention fournissent un
materiel entierement programmable dans lequel sont instancies tous les
etats d'une expression reguliere et tous les etats sont entierement
relies. Dans un mode de realisation, les blocs fonctionnels ont un nombre
fixe d'etats afin de faciliter l'execution sur une puce. Pour un tel mode
de realisation, une expression reguliere presentant un nombre excessif
d'etats est execute sur au moins deux blocs fonctionnels d'automate
d'etats finis et les blocs fonctionnels d'automates d'etats finis sont
ensuite lies ensemble pour realiser une evaluation de l'expression
reguliere. Dans un mode de realisation, au moins deux expressions
regulieres presentant un nombre d'etats inferieur au nombre d'etats fixe
d'un bloc fonctionnel peuvent etre executes avec un bloc fonctionnel
unique.
Detailed Description

Figure 3, may be used to implement a state machine architecture for
realization of a non-deterministic finite state automaton with R
nodes, R symbols, and R^2 arcs. In Figure 3, R has been...

...FSA building block described above can be used to realize fast and
efficient implementations of non-deterministic finite state
automata (NFA) in hardware. The specification of an NFA naturally maps
to the apparatus. Since regular expressions...

...Techniques, and Tools" by Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman].
Notable algorithms include Thompson's construction and the Berry-Sethi
construction. These algorithms map a regular expression comprising...
Detailed Description

Claims 

Full text word Count: 9443 
English Abstract 

A method and apparatus for efficient implementation and evaluation of
state machines and programmable finite state automata is described. In
one embodiment, a state machine architecture comprises a plurality of
node elements, wherein each of the plurality of node elements represents 
a node of a control flow graph. The state machine architecture also 
comprises a plurality of interconnections to connect node elements, a 
plurality of state transition connectivity control logic to enable and 
disable connections within the plurality of interconnections to form the 
control flow graph with the plurality of node elements, and a plurality 
of state transition evaluation logic coupled to the interconnections and 
operable to evaluate input data against criteria, the plurality of state 
transition evaluation logic to control one or more state transitions 
between node elements in the control flow graph. 
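
A software analogy may help picture the node elements, the connectivity
control and the per-node evaluation described above. The sketch below is
only an assumption-laden model, not the hardware in the patent: run_block,
SYMBOLS, CONNECT and the choice of R = 4 nodes are hypothetical. Each node
holds one symbol, CONNECT[i][j] is the enable bit for the arc from node i
to node j (R^2 bits in total), and start nodes may fire on any input
position.

```python
def run_block(symbols, connect, start_nodes, accept_nodes, data):
    """Return the offsets at which an accepting node becomes active."""
    active = set()  # nodes that fired on the previous input symbol
    hits = []
    for offset, ch in enumerate(data):
        nxt = set()
        for j, sym in enumerate(symbols):
            if sym != ch:
                continue  # evaluation logic: the input must match the node's symbol
            # Node j fires if it is a start node or an enabled arc reaches it.
            if j in start_nodes or any(connect[i][j] for i in active):
                nxt.add(j)
        active = nxt
        if active & accept_nodes:
            hits.append(offset)
    return hits


# Hypothetical programming of a 4-node block for the expression a(b|c)*d.
SYMBOLS = ["a", "b", "c", "d"]
CONNECT = [
    [0, 1, 1, 1],  # a -> b, c, d
    [0, 1, 1, 1],  # b -> b, c, d
    [0, 1, 1, 1],  # c -> b, c, d
    [0, 0, 0, 0],  # d has no outgoing arcs
]
```

Reprogramming SYMBOLS and CONNECT changes the expression without changing
run_block, which is the property the abstract attributes to the
connectivity control logic.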

French Abstract 

L'invention concerne un procede et un dispositif pour la mise en oeuvre
et l'evaluation efficace d'automates finis et d'automates finis
programmables. Selon une variante, on decrit une architecture d'automate
fini qui comprend plusieurs elements de noeud, chacun de ces elements
representant un noeud de graphe de flux de commande. L'architecture
comprend aussi plusieurs interconnexions pour la connexion des elements
de noeud, plusieurs logiques de commande de connectivite de transition
d'etat pour l'activation et la desactivation de connexions dans les
interconnexions considerees, visant a etablir le graphe de commande de
flux avec la pluralite d'elements de noeud, et plusieurs logiques
d'evaluation de transition d'etat couplees aux interconnexions et
permettant d'evaluer les donnees d'entree par rapport a des criteres, les
logiques d'evaluation de transition d'etat controlant une ou plusieurs
transitions d'etat entre les elements de noeud dans le graphe de flux de
commande.
Detailed Description

one embodiment can accommodate several thousand state
machines (each comprised of, for example, 16-state non-deterministic
finite state automata) on a single chip.

[0042] Figure 2 illustrates a sample embodiment...illustrates one
embodiment of the state machine architecture, as tailored for the
realization of non-deterministic finite state automata and for
the parallel evaluation of multiple regular expressions on input data.
Figure 4 shows ...

...b) shows the embodiment of the architecture for realization of a state
machine for a non-deterministic finite state automaton with R
nodes, R symbols, and R^2 arcs. In Figure 3(b), R = 3...
...a language is described by a regular expression, one can construct a
nondeterministic finite automaton (NFA) which will recognize that
language. There are well-known algorithms for constructing such NFAs. This
paper surveys existing algorithms and presents two new constructions which
are designed both to produce small (near minimal) NFAs and to do so
efficiently. The authors' first new construction allows
$\epsilon$-transitions, while the...

...don't count). The first construction uses $O(\vert \alpha\vert )$ time
to generate an NFA of size $\leq {3\over 2}\vert \alpha\vert +{5\over
2}$, where the size of an NFA is the sum of the number of states and
number of transitions. They show that...

...construction is near optimal by producing a family of regular
expressions for which every recognizing NFA has size $\geq {4\over
3}\vert \alpha\vert +{1\over 3}$.
The elimination of...

...and space $O(\vert \alpha\vert ^2)$.

The authors compare the size of the NFA produced by their algorithm 
with other well-known constructions, and give several examples of regular 
From regular expressions to DFA's using compressed NFA's.

Summary: "There are two principal methods for turning regular
expressions into NFA's, one due to McNaughton and Yamada and another due
to Thompson. Unfortunately, both have drawbacks. Given a regular
expression $R$ of length $r$ with $s$ occurrences...

...r)$ space algorithms to produce a $\Theta(m)$ space representation of
McNaughton and Yamada's NFA with $s+1$ states and $m$ transitions. The
problem with this NFA is that $m=\Theta(s^2)$ in the worst case.
Thompson's method takes $\Theta(r)$ time and space to construct a
$\Theta(r)$ space NFA with $\Theta(r)$ states and $\Theta(r)$
transitions. The problem with this NFA is that $r$ can be arbitrarily
larger than $s$.

"We overcome the drawbacks of both...

...s)$ space algorithm to construct an $O(s)$ space representation of
McNaughton and Yamada's NFA. Given any set $V$ of NFA states, our
representation can be used to compute the set $U$ of states one transition

...V$ in optimal time $O(\vert V\vert+\vert U\vert)$. McNaughton and
Yamada's NFA requires $\Theta(\vert V\vert\times\vert U\vert)$ time in
the worst case. Using Thompson's NFA, the equivalent calculation
requires $\Theta(r)$ time in the worst case. Comparative benchmarks show
that an implementation of our method outperforms implementations of
competing methods with respect to time for NFA construction, NFA
acceptance testing, and NFA-to-DFA conversion by subset construction.

Throughout this paper program transformations are used to design
algorithms and...
58 0 S2 AND S3 AND THOMPSON?
Treatment: Practical (P) ; Theoretical (T)

Abstract: There are two principal methods for turning regular expressions
into NFAs: one due to R. McNaughton and H. Yamada (1960) and another due to
K. Thompson (1968). Unfortunately, both have drawbacks. Given a regular
expression R of length r and with s occurrences of alphabet symbols, Chang
and Paige (1992) and Bruggemann-Klein (1993) gave Theta (m+r) time and O(r)
space algorithms to produce a Theta (m) space representation of McNaughton
and Yamada's NFA with s+1 states and m transitions. The problem with this
NFA is that m = Theta (s/sup 2/) in the worst case. Thompson's method
takes Theta (r) time and space to construct a Theta (r) space NFA with
Theta (r) states and Theta (r) transitions. The problem with this NFA is
that r can be arbitrarily larger than s. We overcome drawbacks of both
methods with a Theta (r) time Theta (s) space algorithm to construct an
O(s) space representation of McNaughton and Yamada's NFA. Given any set V
of NFA states, our representation can be used to compute the set U of
states one transition away from the states in V in optimal time
O(|V| + |U|). McNaughton and Yamada's NFA requires Theta (|V| x |U|) time
in the worst case. Using Thompson's NFA, the equivalent calculation requires
Theta (r) time in the worst case. Comparative benchmarks show that an
implementation of our method outperforms implementations of competing
methods with respect to time for NFA construction, NFA accepting
testing, and NFA-to-DFA conversion by subset construction. Throughout
this paper program transformations are used to design algorithms and derive
programs. A transformation of special importance is a form of finite
differencing used previously by D. Smith to improve the efficiency of
functional programs. (26 Refs)
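
The s+1 state count and the quadratic worst case for m quoted above can be
checked on small inputs with a textbook position-automaton construction.
This is a minimal sketch of the plain McNaughton-Yamada automaton with
explicit transitions, not the compressed representation the abstract
proposes; the function name glushkov and the tuple-based syntax tree
("sym", "cat", "alt", "star") are assumptions made for illustration.

```python
def glushkov(ast):
    """Build the position automaton; return (state count, transitions, accepting)."""
    symbols = {}  # position (1..s) -> alphabet symbol at that position
    follow = {}   # position -> set of positions that may come next

    def walk(node):
        """Return (nullable, first positions, last positions) for a subtree."""
        kind = node[0]
        if kind == "sym":
            pos = len(symbols) + 1
            symbols[pos] = node[1]
            follow[pos] = set()
            return False, {pos}, {pos}
        if kind == "alt":
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            return n1 or n2, f1 | f2, l1 | l2
        if kind == "cat":
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            for p in l1:
                follow[p] |= f2  # the end of the left part continues into the right
            first = (f1 | f2) if n1 else f1
            last = (l1 | l2) if n2 else l2
            return n1 and n2, first, last
        if kind == "star":
            _, f, l = walk(node[1])
            for p in l:
                follow[p] |= f  # loop back from the end to the beginning
            return True, f, l
        raise ValueError(f"unknown node kind: {kind}")

    nullable, first, last = walk(ast)
    # State 0 is the initial state; states 1..s are the symbol positions.
    transitions = {(0, symbols[p], p) for p in first}
    for p, targets in follow.items():
        transitions |= {(p, symbols[q], q) for q in targets}
    accepting = set(last) | ({0} if nullable else set())
    return len(symbols) + 1, transitions, accepting
```

For (a|b|c)*, with s = 3, every position follows every other, so the
automaton has s + 1 = 4 states and s^2 + s = 12 transitions, which is the
quadratic growth in m that the abstract identifies as the drawback.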
Language: English Document Type: Journal Paper (JP)
Treatment: Theoretical (T)

Abstract: Thompson (1968) introduced an innovative method for obtaining
non-deterministic finite state automata (NFA) from regular
expressions. His formulation of NFAs makes use of epsilon-transitions
(null symbol input) and requires in the worst case 2 sigma +2 OPS states,
where sigma is the number of occurrences of alphabet symbols and OPS is the
number of operands in the original regular expression. We modify this
algorithm to obtain an NFA M without epsilon-transitions that has in the
worst case sigma +1 states. Using multi-branch expression trees to store
the regular expressions efficiently, the algorithm presented is directly
parallelizable. The algorithm necessitates that we maintain a finite state
automaton which has no epsilon-transitions and has a starting node of zero
in-degree. The role of epsilon-transitions in finite state automata is
examined and, based on the technique of bypassing, two alternative
approaches are suggested. (15 Refs)
Non-deterministic finite automata are used in applications, like scanners
and editors, that use pattern matching. Thompson's construction is an
algorithm that will generate a non-deterministic finite automaton given a 
regular expression. The author takes a fresh look at this construction 
method. He specifies and implements it using an attribute grammar, and 
using a scanner and parser generator, he builds recognizer generators that 
can generate a recognizer for a given regular expression. The recognizer 
uses a backtracking algorithm to determine whether a string matches the 
regular expression. He also considers and solves the problem of regular 
expressions that cause the recognizer to loop. 
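
For orientation, Thompson's construction itself can be sketched in a few
lines. This is a hedged illustration and not the author's attribute-grammar
implementation: the names thompson, closure and matches are hypothetical,
the input is assumed to be already in postfix form with "." for
concatenation, "|" for alternation and "*" for closure (so those three
characters cannot appear as literals), and the recognizer tracks a set of
states instead of backtracking, which sidesteps the looping problem the
abstract mentions.

```python
from itertools import count

EPS = None  # label used for epsilon transitions


def thompson(postfix):
    """Build an epsilon-NFA; return (start state, accept state, edges)."""
    ids = count()
    edges = {}  # state -> list of (label, target state)

    def new_state():
        s = next(ids)
        edges[s] = []
        return s

    stack = []  # NFA fragments as (start, accept) pairs
    for tok in postfix:
        if tok == ".":
            s2, a2 = stack.pop()
            s1, a1 = stack.pop()
            edges[a1].append((EPS, s2))  # glue the left fragment to the right
            stack.append((s1, a2))
        elif tok == "|":
            s2, a2 = stack.pop()
            s1, a1 = stack.pop()
            s, a = new_state(), new_state()
            edges[s] += [(EPS, s1), (EPS, s2)]
            edges[a1].append((EPS, a))
            edges[a2].append((EPS, a))
            stack.append((s, a))
        elif tok == "*":
            s1, a1 = stack.pop()
            s, a = new_state(), new_state()
            edges[s] += [(EPS, s1), (EPS, a)]   # enter the loop or skip it
            edges[a1] += [(EPS, s1), (EPS, a)]  # repeat the loop or leave it
            stack.append((s, a))
        else:
            s, a = new_state(), new_state()
            edges[s].append((tok, a))  # single-symbol fragment
            stack.append((s, a))
    start, accept = stack.pop()
    return start, accept, edges


def closure(states, edges):
    """Return all states reachable from `states` through epsilon edges alone."""
    seen, todo = set(states), list(states)
    while todo:
        for label, target in edges[todo.pop()]:
            if label is EPS and target not in seen:
                seen.add(target)
                todo.append(target)
    return seen


def matches(nfa, text):
    """Whole-string match by simulating the set of active NFA states."""
    start, accept, edges = nfa
    current = closure({start}, edges)
    for ch in text:
        step = {t for s in current for label, t in edges[s] if label == ch}
        current = closure(step, edges)
    return accept in current
```

The postfix string "ab|*a.b.b." corresponds to (a|b)*abb; passing the
result of thompson to matches tests a whole string against it.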
Abstract: Program transformations constitute a form of finite
differencing to improve the efficiency of functional programs and are used
to design and derive programs. There are two methods to transform regular
expressions into NFAs, one due to McNaughton and Yamada and the other
due to Thompson. Given regular expressions R of length r and with s
occurrences of alphabet symbols, Theta (m plus r) time and O(r) space
algorithms are derived to produce Theta (m) space representation of
McNaughton and Yamada's NFA with s plus 1 states and m transitions.
However, the drawback of this NFA is that m equals Theta (s**2) in the
worst case. Thompson's method takes Theta (r) time and space to construct
a Theta (r) space NFA with Theta (r) states and Theta (r) transitions, but
r can be arbitrarily larger than s. 26 Refs.
Language: English
Journal Announcement: 9404W5

Abstract: Thompson introduced an innovative method for obtaining
non-deterministic finite state automata (NFA) from regular
expressions. His formulation of NFAs makes use of epsilon-transitions
(null symbol point). We modify this algorithm to obtain an NFA M without
epsilon-transitions that has in the worst case delta plus 1 states. Using
multi-branch expression trees to store the regular expressions efficiently,
the algorithm presented here is directly parallelizable. The algorithm
necessitates that we maintain a finite state automaton which has no
epsilon-transitions and has a starting node of zero in-degree. (Edited author
abstract) 15 Refs.
We show how to turn a regular expression R of length r into an O(s)
space representation of McNaughton and Yamada's NFA, where s is the
number of occurrences of alphabet symbols in R, and s + 1 is the number of
NFA states. The standard adjacency list representation of McNaughton and
Yamada's NFA takes up $1 + 2s + s^2$ space in the worst case. The
adjacency list representation of the NFA produced by Thompson takes up
between 2r and 6r space, where r can be arbitrarily larger than s. Given
any subset V of states in McNaughton and Yamada's NFA, our representation
can be used to compute the set U of states one transition away from the
states in V in optimal time $O(\vert V\vert + \vert U\vert)$. McNaughton
and Yamada's NFA requires $\Theta(\vert V\vert \times \vert U\vert)$
time in the worst case. Using Thompson's NFA, the equivalent
calculation requires $\Theta(r)$ time in the worst case.

An implementation of our NFA representation confirms that it takes
up an order of magnitude less space than McNaughton and Yamada's machine.
An implementation to produce a DFA from our NFA representation by
subset construction shows linear and quadratic speedups over subset
construction starting from both Thompson's and McNaughton and Yamada's
NFAs. It also shows that the DFA produced from our NFA is as much as
one order of magnitude smaller than DFAs constructed from the two other
NFAs.

A UNIX egrep compatible software called cgrep based on our NFA
representation is implemented. A benchmark shows that cgrep is dramatically
faster than both UNIX egrep and GNU e?grep.

Throughout this thesis the importance of syntax is stressed in the 
design of our algorithms. In particular, we exploit a method of program 
improvement in which costly repeated calculations can be avoided by 
establishing and maintaining program invariants. This method of symbolic 
finite differencing has been used previously by Douglas Smith to derive 
efficient functional programs. 
Copyright (c) 1997 Elsevier Science B.V. All rights reserved. There are
two principal methods for turning regular expressions into NFAs,
one due to McNaughton and Yamada and another due to Thompson.
Unfortunately, both have drawbacks. Given a regular expression R of length
r and with s occurrences of alphabet symbols, Chang and Paige (1992) and
Brueggemann-Klein (1993) gave Theta(m+r) time and O(r) space algorithms to
produce a Theta(m) space representation of McNaughton and Yamada's
NFA with s+1 states and m transitions. The problem with this NFA is
that m = Theta(s^2) in the worst case. Thompson's method takes
Theta(r) time and space to construct a Theta(r) space NFA with Theta(r)
states and Theta(r) transitions. The problem with this NFA is that r can
be arbitrarily larger than s. We overcome drawbacks of both methods with a
Theta(r) time Theta(s) space algorithm to construct an O(s) space
representation of McNaughton and Yamada's NFA. Given any set V of
NFA states, our representation can be used to compute the set U of states
one transition away from the states in V in optimal time
O(|V| + |U|). McNaughton and Yamada's NFA
requires Theta(|V| x |U|) time in the worst case.
Using Thompson's NFA, the equivalent calculation requires
Theta(r) time in the worst case. Comparative benchmarks show that an
implementation of our method outperforms implementations of competing
methods with respect to time for NFA construction, NFA accepting
testing, and NFA-to-DFA conversion by subset construction. Throughout
this paper program transformations are used to design algorithms and derive
programs. A transformation of special importance is a form of finite
differencing used previously by Douglas Smith to improve the efficiency of
functional programs.